Abstract: Gallager and Van Voorhis have found optimal prefix-free codes $\kappa(K)$ for a
random variable $K$ that is geometrically distributed: $\Pr[K = k] = p(1-p)^k$ for $k \ge 0$.
We determine the asymptotic behavior of the expected length $\mathrm{Ex}[\#\kappa(K)]$ of these codes
as $p \to 0$:

$$\mathrm{Ex}[\#\kappa(K)] = \log_2\frac{1}{p} + \log_2\log 2 + 2 + f\!\left(\log_2\frac{1}{p} + \log_2\log 2\right) + O(p),$$

where

$$f(z) = 4\cdot 2^{-2^{1-\{z\}}} - \{z\} - 1$$

and $\{z\} = z - \lfloor z\rfloor$ is the fractional part of $z$. The function $f(z)$ is a periodic function (with
period 1) that exhibits small oscillations (with magnitude less than 0.005) about an even
smaller average value (less than 0.0005).


1. Introduction 

In 1975, Gallager and Van Voorhis [G1] (building on prior work by Golomb [G2])
found optimal prefix-free codes for a geometrically distributed random variable; that is, a
random variable $K$ such that

$$\Pr[K = k] = p(1-p)^k \tag{1.1}$$

for $k \ge 0$, where $0 < p < 1$ is a parameter. (This problem is sometimes referred to
as the “run-length encoding” problem, since the number of 0s between consecutive 1s
in a sequence of independent and identically distributed Bernoulli random variables is
geometrically distributed.) Their result shows that the optimal codes $\kappa(K)$ have expected
codeword length $\mathrm{Ex}[\#\kappa(K)]$ close, but not equal, to the lower bound given by the entropy

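$$\begin{aligned}
\mathrm{Ex}[\#\kappa(K)] &\ge H(K) \\
&= -\sum_{k \ge 0} p(1-p)^k \log_2\bigl(p(1-p)^k\bigr) \\
&= \log_2\frac{1}{p} - \frac{1-p}{p}\log_2(1-p) \\
&= \log_2\frac{1}{p} + \log_2 e + O(p),
\end{aligned} \tag{1.2}$$

where $\log_2 e = 1.442\ldots$ and $\log$ without a subscript denotes the natural logarithm.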
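The three expressions for the entropy in (1.2) can be compared numerically. The following Python sketch is only an illustration: the function names and the truncation tolerance `tol` are assumptions introduced here, not part of the original analysis.

```python
import math

def entropy_series(p, tol=1e-15):
    """Sum -P_k log2 P_k for P_k = p(1-p)^k, stopping once (1-p)^k <= tol."""
    total = 0.0
    k = 0
    while (1.0 - p) ** k > tol:
        prob = p * (1.0 - p) ** k
        total -= prob * math.log2(prob)
        k += 1
    return total

def entropy_closed(p):
    """Closed form: log2(1/p) - ((1-p)/p) log2(1-p)."""
    return math.log2(1.0 / p) - (1.0 - p) / p * math.log2(1.0 - p)

def entropy_asymptotic(p):
    """Leading terms of (1.2): log2(1/p) + log2(e), dropping the O(p) remainder."""
    return math.log2(1.0 / p) + math.log2(math.e)

for p in (0.1, 0.01, 0.001):
    print(p, entropy_series(p), entropy_closed(p), entropy_asymptotic(p))
```

The gap between the closed form and the two-term approximation should shrink roughly in proportion to $p$, as the $O(p)$ term in (1.2) indicates.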

In Section 2, we shall show that

$$\mathrm{Ex}[\#\kappa(K)] = \log_2\frac{1}{p} + \log_2\log 2 + 2 + f\!\left(\log_2\frac{1}{p} + \log_2\log 2\right) + O(p), \tag{1.3}$$

where the function $f(z)$ is a bounded periodic function of $z$ with period 1. Specifically,

$$f(z) = 4\cdot 2^{-2^{1-\{z\}}} - \{z\} - 1,$$

with $\{z\} = z - \lfloor z\rfloor$ denoting the fractional part of $z$. The function $f(z)$ exhibits small
oscillations about its average value $\omega = \int_0^1 f(z)\,dz = 4(\log_2 e)\bigl(E_1(\log 2) - E_1(2\log 2)\bigr) - 3/2 = 0.0004547\ldots$,
where $E_1(y) = \int_y^\infty (e^{-x}/x)\,dx$. It assumes its largest value of $f(z_1) = 0.004195\ldots$
at $z_1 = 1 + \log_2\log 2 - \log_2 x_1 = 0.7680\ldots$, where $x_1 = 0.8140\ldots$ is the smaller
solution of the equation $xe^{-x} = (\log_2 e)/4$, and its smallest value of $f(z_0) = -0.003438\ldots$
at $z_0 = 1 + \log_2\log 2 - \log_2 x_0 = 0.1934\ldots$, where $x_0 = 1.2123\ldots$ is the larger solution
of that equation. Comparing this result with (1.2), we see that the average redundancy of
the optimal code is $\log_2\log 2 + 2 + \omega - \log_2 e = 0.02899\ldots$.
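The size of these oscillations can be checked by brute force. The sketch below evaluates $f$ on a fine grid over one period and reports the extreme values, a midpoint-rule estimate of the average $\omega$, and the resulting average redundancy. The grid size and the helper name `scan` are illustrative assumptions, and the form of $f$ used is the one stated above.

```python
import math

LN2 = math.log(2.0)

def f(z):
    """f(z) = 4 * 2^(-2^(1-{z})) - {z} - 1, with {z} the fractional part of z."""
    frac = z - math.floor(z)
    return 4.0 * 2.0 ** (-(2.0 ** (1.0 - frac))) - frac - 1.0

def scan(n=100_000):
    """Evaluate f at n midpoints of [0, 1); return extremes and the mean value."""
    samples = [(f((i + 0.5) / n), (i + 0.5) / n) for i in range(n)]
    f_max, z_max = max(samples)
    f_min, z_min = min(samples)
    average = sum(value for value, _ in samples) / n
    return f_max, z_max, f_min, z_min, average

f_max, z_max, f_min, z_min, omega = scan()
# Average redundancy: log2(log 2) + 2 + omega - log2(e)
redundancy = math.log2(LN2) + 2.0 + omega - math.log2(math.e)
print(f_max, z_max, f_min, z_min, omega, redundancy)
```

A grid scan is crude compared with solving for the stationary points directly, but it makes no use of the derivative and so serves as an independent check.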
2. Run-Length Encoding

According to Gallager and Van Voorhis [G1], the optimal prefix-free binary codes
for a geometric random variable $K$, distributed according to (1.1), can be constructed as
follows. Set

$$m = \left\lceil \frac{\log(2-p)}{-\log(1-p)} \right\rceil. \tag{2.1}$$

Divide $K$ by $m$ to obtain a quotient $S \ge 0$ and a remainder $0 \le R \le m-1$:

$$K = Sm + R.$$

The distribution of $S$ is geometric with parameter $q = 1 - (1-p)^m$; that is,

$$\Pr[S = s] = q(1-q)^s$$

for $s \ge 0$. The distribution of $R$ is “truncated geometric”:

$$\Pr[R = r] = \frac{p(1-p)^r}{1-(1-p)^m} \tag{2.2}$$

for $0 \le r \le m-1$. We shall take optimal prefix-free codes $\sigma(S)$ for $S$ and $\rho(R)$ for $R$, and
concatenate them (as strings) to obtain an optimal code $\kappa(K) = \sigma(S)\,\rho(R)$.

Since (2.1) implies $q \ge (3-\sqrt{5})/2$, an optimum prefix-free code for the quotient $S$ is
$\sigma(S) = 1^S 0$, and the expected length of this code is

$$\mathrm{Ex}[\#\sigma(S)] = \mathrm{Ex}[S] + 1 = \frac{1}{q} = \frac{1}{1-(1-p)^m}. \tag{2.3}$$

The optimum prefix-free code for the remainder $R$ is a Huffman code $\rho(R)$ (see Huffman [H]).
We observe that the ratio between the smallest and the largest of the probabilities
given by (2.2) is $(1-p)^{m-1}$. Since (2.1) implies that $(1-p)^{m-1} > 1/(2-p) > 1/2$, it
follows that all of the codewords in this Huffman code must be of at most two consecutive
lengths. (If there were a codeword $\xi$ of length $i$ and two codewords $\eta 0$ and $\eta 1$ each of
length $j \ge i+2$, then the probability of $\xi$ would be strictly smaller than the sum of the
probabilities of $\eta 0$ and $\eta 1$, and the code with $\eta$ of length $j-1$ and $\xi 0$ and $\xi 1$ each of
length $i+1$ would have strictly smaller expected length.) Take

$$l = \lfloor \log_2 m \rfloor$$

and

$$h = m - 2^l,$$

so that $0 \le h \le 2^l - 1$. Then there will be $2^l - h$ codewords of length $l$ and $2h$ codewords
of length $l+1$. Since the $2h$ longer codewords will have the $2h$ smallest probabilities, the
expected codeword length for the Huffman code is


$$\mathrm{Ex}[\#\rho(R)] = l + \sum_{m-2h \le k \le m-1} \frac{p(1-p)^k}{1-(1-p)^m}
= l + \frac{(1-p)^{m-2h} - (1-p)^m}{1-(1-p)^m}.$$


Combining this expression for the expected length of the encoding of $R$ with (2.3) for the
expected length of the encoding of $S$, we obtain

$$\begin{aligned}
\mathrm{Ex}[\#\kappa(K)] &= \mathrm{Ex}[\#\sigma(S)] + \mathrm{Ex}[\#\rho(R)] \\
&= l + \frac{(1-p)^{m-2h} - (1-p)^m}{1-(1-p)^m} + \frac{1}{1-(1-p)^m}
\end{aligned} \tag{2.4}$$

for the expected length of the encoding of $K$.
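The construction can be made concrete in a few lines of Python. The sketch below is one possible realization, under stated assumptions: the remainder is encoded with a "truncated binary" code, which is one way (not necessarily the authors') to assign $2^l - h$ codewords of length $l$ to the most probable remainders and $2h$ codewords of length $l+1$ to the rest. All function names are hypothetical. The sketch then compares a direct summation of the expected length with the closed form (2.4).

```python
import math

def golomb_parameter(p):
    """m from (2.1): ceil(log(2 - p) / (-log(1 - p)))."""
    return math.ceil(math.log(2.0 - p) / -math.log(1.0 - p))

def encode(k, m):
    """Unary quotient 1^s 0 followed by a two-length code for the remainder."""
    s, r = divmod(k, m)
    l = m.bit_length() - 1            # floor(log2 m)
    h = m - (1 << l)
    short = (1 << l) - h              # m - 2h remainders receive l bits
    if r < short:
        tail = format(r, "b").zfill(l) if l > 0 else ""
    else:
        tail = format(r + short, "b").zfill(l + 1)
    return "1" * s + "0" + tail

def expected_length_numeric(p, tol=1e-15):
    """Sum Pr[K = k] * len(codeword), stopping once (1-p)^k <= tol."""
    m = golomb_parameter(p)
    total, k = 0.0, 0
    while (1.0 - p) ** k > tol:
        total += p * (1.0 - p) ** k * len(encode(k, m))
        k += 1
    return total

def expected_length_closed(p):
    """Right-hand side of (2.4)."""
    m = golomb_parameter(p)
    l = m.bit_length() - 1
    h = m - (1 << l)
    b = (1.0 - p) ** m
    return l + ((1.0 - p) ** (m - 2 * h) - b) / (1.0 - b) + 1.0 / (1.0 - b)

for p in (0.3, 0.2, 0.05):
    print(p, golomb_parameter(p), expected_length_numeric(p), expected_length_closed(p))
```

For example, $p = 0.2$ gives $m = 3$, so the remainders 0, 1, 2 receive the tails `0`, `10`, `11`, and `encode(4, 3)` returns `1010`.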

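Since $l$ and $h$ are defined in terms of $m$, we shall start by eliminating them in favor of

$$\vartheta = \{\log_2 m\},$$

the fractional part of $\log_2 m$. Then $l = \log_2 m - \vartheta$, and from $m = 2^l + h$ we obtain
$1 - 2h/m = 2^{1-\vartheta} - 1$. Substituting these expressions in (2.4) yields

$$\mathrm{Ex}[\#\kappa(K)] = \log_2 m - \vartheta + \frac{(1-p)^{m(2^{1-\vartheta}-1)} - (1-p)^m}{1-(1-p)^m} + \frac{1}{1-(1-p)^m}. \tag{2.5}$$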

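Since $\vartheta$ is defined in terms of $m$, our next step will be to eliminate $p$ in favor of $m$ by using
the relation

$$m = \frac{\log 2}{p} + O(1), \tag{2.6}$$

which follows from (2.1) and implies

$$p = \frac{\log 2}{m} + O\!\left(\frac{1}{m^2}\right).$$

Thus

$$(1-p)^m = \frac{1}{2} + O\!\left(\frac{1}{m}\right).$$

Substituting this expression in (2.5) yields

$$\mathrm{Ex}[\#\kappa(K)] = \log_2 m + 2 + f(\log_2 m) + O\!\left(\frac{1}{m}\right), \tag{2.7}$$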
where

$$f(z) = 4\cdot 2^{-2^{1-\{z\}}} - \{z\} - 1.$$

The function $f$ is periodic with period 1. It is also continuous (because $\lim_{z\to 1} f(z) =
f(0) = 0$), and has a continuous derivative (because $\lim_{z\to 1} f'(z) = f'(0) = 2(\log 2)^2 - 1$).
These properties, combined with the relation

$$\log_2 m = \log_2\frac{1}{p} + \log_2\log 2 + O(p),$$

which follows from (2.6), allow us to deduce

$$f(\log_2 m) = f\!\left(\log_2\frac{1}{p} + \log_2\log 2\right) + O(p).$$

This in turn allows us to rewrite (2.7) in terms of $p$ as

$$\mathrm{Ex}[\#\kappa(K)] = \log_2\frac{1}{p} + \log_2\log 2 + 2 + f\!\left(\log_2\frac{1}{p} + \log_2\log 2\right) + O(p).$$
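A numerical comparison gives some feel for the $O(p)$ remainder. The sketch below evaluates the exact expected length from (2.4) and the asymptotic expression in terms of $p$, then prints their difference divided by $p$. The function names are hypothetical, and the sample values of $p$ are arbitrary choices; the sketch illustrates the claim for a few values and proves nothing about the remainder in general.

```python
import math

LN2 = math.log(2.0)

def f(z):
    """f(z) = 4 * 2^(-2^(1-{z})) - {z} - 1, with {z} the fractional part of z."""
    frac = z - math.floor(z)
    return 4.0 * 2.0 ** (-(2.0 ** (1.0 - frac))) - frac - 1.0

def exact_length(p):
    """Closed form (2.4), with m, l and h as defined in the text."""
    m = math.ceil(math.log(2.0 - p) / -math.log(1.0 - p))
    l = m.bit_length() - 1
    h = m - (1 << l)
    b = (1.0 - p) ** m
    return l + ((1.0 - p) ** (m - 2 * h) - b) / (1.0 - b) + 1.0 / (1.0 - b)

def asymptotic_length(p):
    """log2(1/p) + log2(log 2) + 2 + f(log2(1/p) + log2(log 2))."""
    y = math.log2(1.0 / p) + math.log2(LN2)
    return y + 2.0 + f(y)

for p in (1e-2, 1e-3, 1e-4):
    gap = exact_length(p) - asymptotic_length(p)
    print(f"p={p:g}  exact={exact_length(p):.6f}  "
          f"asymptotic={asymptotic_length(p):.6f}  gap/p={gap / p:+.3f}")
```

If the remainder is indeed $O(p)$, the printed ratio `gap/p` should stay bounded as $p$ decreases.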

It remains to examine the properties of the function $f(z)$. The continuity of the periodic
function $f(z)$ ensures that it is bounded, and the continuity of its derivative ensures that its
maxima and minima occur at values of $z$ for which the derivative vanishes. The vanishing
of the derivative is given by the equation

$$1 = 4\cdot 2^{-2^{1-z}}\cdot 2^{1-z}(\log 2)^2.$$

The substitution $x = 2^{1-z}\log 2$ reduces this equation to $xe^{-x} = (\log_2 e)/4$, from
which the numerical results mentioned in Section 1 follow. The same substitution
also reduces the integral $\omega = \int_0^1 f(z)\,dz$ for the average value of $f(z)$ to the expression
$4(\log_2 e)\int_{\log 2}^{2\log 2}(e^{-x}/x)\,dx - 3/2$, leading again to the numerical results mentioned in Section 1.
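Both numerical steps can be reproduced with elementary methods. The sketch below solves $xe^{-x} = (\log_2 e)/4$ by bisection on the two intervals $[\log 2, 1]$ and $[1, 2\log 2]$ (the range of $x$ as $z$ runs over one period), maps the roots back to $z$, and evaluates the integral for $\omega$ with the composite Simpson rule. The helper names, iteration count, and number of subintervals are assumptions made for illustration.

```python
import math

LN2 = math.log(2.0)
TARGET = math.log2(math.e) / 4.0

def bisect(g, lo, hi, iters=200):
    """Plain bisection; assumes g(lo) and g(hi) have opposite signs."""
    g_lo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        g_mid = g(mid)
        if (g_mid > 0) == (g_lo > 0):
            lo, g_lo = mid, g_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    step = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * step)
    return total * step / 3.0

def stationarity(x):
    """Zero exactly when x e^{-x} = (log2 e) / 4."""
    return x * math.exp(-x) - TARGET

# x e^{-x} increases on [log 2, 1] and decreases on [1, 2 log 2].
x1 = bisect(stationarity, LN2, 1.0)
x0 = bisect(stationarity, 1.0, 2.0 * LN2)
z1 = 1.0 + math.log2(LN2) - math.log2(x1)   # location of the maximum of f
z0 = 1.0 + math.log2(LN2) - math.log2(x0)   # location of the minimum of f
omega = 4.0 * math.log2(math.e) * simpson(lambda x: math.exp(-x) / x, LN2, 2.0 * LN2) - 1.5
print(x1, x0, z1, z0, omega)
```

Bisection suffices here because $xe^{-x}$ is monotone on each of the two intervals, so each contains exactly one root.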

We mention in closing that Wolf [W] has shown how to use some of the optimal prefix-free
codes found by Gallager and Van Voorhis to construct asymptotically optimal nested
strategies for group testing. For this problem, the lower bound (applying to all strategies, 
nested or not) to the expected number of tests per positive individual has the asymptotic 
behavior indicated in (1.2). Thus the gap between (1.3) and (1.2) represents a bound to 
the possible advantage that non-nested strategies might have over nested ones. 